Systems and methods of echo & noise cancellation in voice communication

ABSTRACT

In an example, time and frequency domain speech enhancement is implemented on a platform having a programmable device, such as a PC or a smartphone running an OS. Echo cancellation is done first in the time domain to cancel a dominant portion of the echo. Residual echo is cancelled jointly with noise reduction during a subsequent frequency domain stage. The time domain block uses a dual-band, shorter-length Adaptive Filter for faster convergence. Non-linear residual echo is cancelled based on an echo estimate and an error signal from the adaptive filters. A controller locates regions that had residual echo suppressed and which do not have speech, and injects comfort noise. The controller can be full-duplex and operate non-linearly. An AGC selectively amplifies the frequency bins, based on the gain function used by the residual echo and noise canceller.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application No. 61/697,682, entitled “SYSTEMS AND METHODS OF ECHO & NOISE CANCELLATION IN VOICE COMMUNICATION”, which was filed on Sep. 6, 2012, and is hereby incorporated by reference in its entirety.

BACKGROUND

The present invention generally relates to improving the quality of voice communication, and more particularly to echo and noise cancellation in packet-based voice communication systems.

Most VoIP vendors have a goal of providing a generic VoIP solution for heterogeneous platforms, including platforms such as PCs and mobile platforms. However, variation in platform requirements and characteristics makes high-performance, platform-generic speech enhancement a difficult problem. For example, variation in echo path pure delay, hardware non-linearity, and negative ERL, due to situations such as bad acoustic coupling, clock drift and so on, pose difficulties. Full duplex voice communication presents difficulties as well. Still other considerations are computation and power efficiency, and maintaining stable performance and quality in a multitasking environment, in which there may be variable computation resource availability.

SUMMARY

The following discloses methods and systems of echo cancellation that may find application across a wide variety of platforms. In one aspect, the proposed echo cancellation system uses a dual-band, shorter-length time domain Adaptive Filter (ADF) followed by a frequency domain speech enhancement system. The ADF works on two bands with an appropriate de-correlation filter to speed up the convergence rate. The frequency domain speech enhancement system includes a Residual Echo and Noise Cancellation System (RENC), a Non-linear Processor (NLP) controller and a Frequency domain Automatic Gain Controller (FAGC).

In an aspect, the residual echo from longer reverberation and non-linearity is suppressed further, jointly with noise cancellation. It has been found that a large part of the residual echo is correlated with the acoustic echo estimate from the ADF. Canceling the residual echo as part of noise cancellation has been found to produce better results than using a spectral subtraction method with platform-specific tunable gain parameters for individual frequency bins.

In one example implementation, a modified Wiener Filter is used to cancel both residual echo and noise jointly. In another example, a modified Minimum Mean-Square Error Log Spectral Amplitude (MMSE-LSA) estimator cancels residual echo and noise together. In these examples, since residual echo is canceled simultaneously with noise, additional complexity specifically for the residual echo cancellation is reduced.

In some examples, the FAGC uses the frequency domain gain function obtained from the residual echo canceller to produce a Voice Activity Decision (VAD). The FAGC amplifies only speech frequency bins, so that the FAGC does not boost a noise signal embedded with the speech, and provides better voice quality.

The NLP Controller locates sample regions that have only residual echo (and not speech). These regions are processed by an Acoustic Echo Suppressor (AES), which replaces the signal in these regions with comfort noise. In an example, to identify the residual-echo-alone region, the NLP controller uses correlation between inputs including the error and microphone signals, error energy, microphone signal energy, and a long term average of the reference signal amplitude, as described below. In the example, the NLP controller activates non-linear processing based on a plurality of decision parameters, and further based on a set of pre-defined validation conditions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a system context in which methods and systems according to the disclosure can be practiced;

FIG. 2 depicts an example architecture of an echo cancellation system according to the disclosure;

FIG. 3 depicts an example architecture of a Residual Echo and Noise Canceller (RENC) according to the disclosure;

FIG. 4 depicts an example architecture of a gain estimation block;

FIG. 5 depicts an example architecture of a Frequency domain Automatic Gain Controller (FAGC) according to the disclosure;

FIG. 6 depicts an example flow for a controller of a Non-Linear Processor (NLP) used in the example echo cancellation architecture;

FIG. 7 depicts an example of NLP decision logic for a Non-Linear Processor (NLP) used in the example echo cancellation architecture;

FIG. 8 depicts an ensemble average of ERLE without the nearend party active;

FIG. 9 depicts an ensemble average of ERLE with the nearend party active;

FIG. 10 depicts FAGC input and output signals and global gain for a tone signal;

FIG. 11 depicts FAGC input and output signal power levels for a tone signal;

FIG. 12 depicts FAGC input and output signals and global gain for a speech signal;

FIG. 13 depicts FAGC input and output signal power levels for a speech signal;

FIG. 14 depicts NLP decisions on an Echo Suppressor (ES) input signal; and

FIG. 15 depicts ES output (AES input) and AES output signals.

DETAILED DESCRIPTION

This disclosure includes sections relating to an example high level architecture of a speech enhancement system, details of an example Residual Echo and Noise Cancellation (RENC) system, details of an example Frequency domain Automatic Gain Controller (FAGC), details of a proposed NLP controller, and performance examples of the proposed speech enhancement system for real-time captured test signals.

FIG. 1 depicts a system context in which methods and systems according to the disclosure can be practiced. FIG. 1 depicts a situation in which devices (device 20 and device 45) support voice communication over packet networks (e.g., network 40), and in a particular example, where the devices support Voice over Internet Protocol (VoIP). User satisfaction with voice communication is degraded greatly by echo. To provide context, echo can be viewed as a situation in which a far-end signal (13) from a far-end device 45 is being played from a speaker at a near end (12) device 20 (this signal can include voice from a person at device 45, noise 16, and echo derived from near-end 12 speech played out at a speaker of device 45). A microphone 23 at near end device 20 samples the available audio energy, picks up the far-end signal, encodes some part of the far-end signal and returns it to the far-end device 45 (such as in voice packets 34 and 35), which produces audio through a speaker, including noise and the echoed far-end signal picked up at near end 12. Note that near-end and far-end here are simply conventions which would change based on perspective; in a full-duplex conversation, they are interchangeable.

By further explanation, device 20 (and device 45) may include a display 22, a speaker 24, a non-volatile storage 25, a volatile memory hierarchy 26, and one or more processors 27. These components can execute an echo cancellation system according to these disclosures.

Overview of Echo Cancellation System

A high level architecture of an example echo and noise cancellation system is shown in FIG. 2. The input signals to the Acoustic Echo Canceller (AEC) 102 are the microphone signal d(n) and the farend signal x(n) being played out through the speaker (signals having n as an argument are digital versions of a time-domain signal). The system contains a Band Pass Filter (BPF) 107, Band Splitters 113, De-Correlation Filters (DCFs) 129, 131, Adaptive Filters (ADFs) 123, 125, Band Mixers 115, 117, a Residual Echo & Noise Canceller (RENC) 119, an NLP controller 109, and an Acoustic Echo Suppressor (AES) 111. Aspects presented herein include example designs for a high performance simultaneous noise & residual echo cancellation unit, an example design of a full-duplex NLP controller, and an example design of an efficient frequency domain gain control unit.

The example system contains two delay compensation units: pure delay compensation, and delay compensation with respect to the microphone signal in order to synchronize the microphone signal with the RENC output signal. The pure delay can be estimated using an ADF running in the decimated domain. The estimation of pure delay is configurable. In an example, the algorithmic delay of the Residual Echo and Noise Cancellation (RENC) unit is 6 ms, so a compensation delay of about that amount is introduced to the microphone signal to align it with the RENC output signal.

Band Pass Filter (BPF)

The BPF removes DC and unwanted high frequency components from the inputs. The cut-off frequencies of this filter are 0.0125 and 0.96. A 6th-order IIR filter is used because of its simplicity and low processing requirement.
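
For illustration only, a 6th-order IIR filter of this kind is commonly realized as a cascade of three biquad sections; the sketch below shows that structure in C. The structure and function names are illustrative, and the coefficient values, which the text does not give, would come from an offline filter design for the stated cut-offs.

```c
/* Illustrative realization of a 6th-order IIR BPF as three cascaded
 * biquads (transposed direct form II). Coefficients are assumed to be
 * supplied by a filter design tool; names are not from the original. */
typedef struct { float b0, b1, b2, a1, a2, z1, z2; } biquad_t;

static float biquad_step(biquad_t *s, float x)
{
    float y = s->b0 * x + s->z1;           /* output of this section */
    s->z1 = s->b1 * x - s->a1 * y + s->z2; /* update delay states    */
    s->z2 = s->b2 * x - s->a2 * y;
    return y;
}

float bpf_step(biquad_t sec[3], float x)
{
    for (int i = 0; i < 3; i++)
        x = biquad_step(&sec[i], x);       /* pass through each section */
    return x;
}
```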

Band Splitter

The Band Splitter splits the signal into two channels. The band splitter uses a Quadrature Mirror Filter (QMF) for band splitting. For the two-band AEC processing, the input signal is split into 2 channels with a cut-off frequency of π/2. The sampling rate of each channel is reduced to half of the original sampling rate using a decimation factor of 2. This sample rate reduction provides efficient AEC processing.

De-Correlation Filter (DCF)

To avoid degradation of the performance of NLMS algorithms due to the strong correlation of speech signals, the farend signal is pre-whitened by applying a de-correlation filter before giving it to the adaptive filter. The de-correlation filter is a first-order prediction-error HPF, with its coefficient matched to the correlation properties of the speech signal. This filtering increases the rate of convergence of the adaptive filter for speech signals. The typical value of the filter coefficient is 0.875.
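
Because the DCF is fully specified above (a first-order prediction-error high-pass filter with coefficient 0.875), it can be sketched in a few lines of C; the function name and block interface are illustrative.

```c
/* First-order prediction-error HPF: y[n] = x[n] - 0.875 * x[n-1].
 * 'state' carries the last input sample across block boundaries. */
void dcf_prewhiten(const float *x, float *y, int n, float *state)
{
    const float c = 0.875f;          /* typical coefficient from the text */
    float prev = *state;
    for (int i = 0; i < n; i++) {
        y[i] = x[i] - c * prev;      /* prediction error = whitened sample */
        prev = x[i];
    }
    *state = prev;
}
```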

Adaptive Filter (ADF)

The adaptive filter (ADF) uses a delayed-error NLMS algorithm. Since the filter runs in the decimated and de-correlated domain with a shorter filter length, the convergence of the filter is much faster. The maximum number of taps used per filter is 256. Each ADF has its own built-in near-end speech detector that activates/de-activates the weight adaptation.
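
A minimal sketch of one NLMS tap-update step is shown below for orientation; the delayed-error scheduling and the built-in near-end speech detector gating mentioned above are omitted, and all names are illustrative.

```c
#include <string.h>

#define ADF_TAPS 256              /* maximum taps per filter, per the text */

typedef struct {
    float w[ADF_TAPS];            /* adaptive filter weights */
    float x[ADF_TAPS];            /* reference (far-end) delay line */
} nlms_t;

/* One NLMS step: filter the reference, form the error against the mic
 * sample d, and adapt the weights with a power-normalized step size. */
float nlms_step(nlms_t *f, float x_new, float d, float mu, float eps)
{
    memmove(&f->x[1], &f->x[0], (ADF_TAPS - 1) * sizeof(float));
    f->x[0] = x_new;

    float y = 0.0f, pow = eps;    /* echo estimate and input power */
    for (int i = 0; i < ADF_TAPS; i++) {
        y   += f->w[i] * f->x[i];
        pow += f->x[i] * f->x[i];
    }
    float e = d - y;              /* error = mic minus echo estimate */
    float g = mu * e / pow;       /* normalized step */
    for (int i = 0; i < ADF_TAPS; i++)
        f->w[i] += g * f->x[i];   /* weight adaptation */
    return e;
}
```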

Band Mixer

The Band Mixer combines the echo estimates and error signals from the two bands, after AEC processing, back into respective single full-band signals. The echo estimates and error signals are up-sampled before being combined by the synthesis filter bank into an original sampling rate signal. The combined structure for splitting the channels and combining them again is called a Quadrature-Mirror Filter (QMF) bank.

The Band Mixer 115, 117 outputs e(n) and y(n) are passed to RENC 119, which, as will be described below, further suppresses echo and background noise. RENC 119 also has an AGC 121. The RENC 119 outputs signals including s(n) through AGC 121 (see FIG. 2) to AES 111 and s′(n) to NLP controller 109. s(n) is the enhanced nearend signal after canceling residual echo and background noise. The signal s′(n) is an output of the FAGC.

Since NLP controller 109 uses correlation between the error and microphone signals, the output signal obtained before the FAGC's action is given to it. The FAGC output is given to the AES unit for further processing to eliminate unwanted very low level residual echo. The AES is controlled based on Non-linear Processor (NLP) decisions.

NLP controller 109 enables or disables Non-Linear Processing (NLP), and the AES, as being part of the NLP. NLP can completely remove the residual echo during single talk. The NLP decision also can ensure no signal clipping when passing from single talk to double-talk. Because the NLP controller 109 responds quickly, without hangover, during the start of near end signal presence in the microphone signal, this unit also can be called a Sensitive Double-Talk Detector (SNS DTD).

Acoustic Echo Suppressor (AES) 111 is a switched attenuator. The AES comprises a noise parameter extractor and a Comfort Noise Injection (CNI) unit. During single talk, the AES replaces residual echo with comfort noise generated by the CNI unit. The AES provides a smooth transition between the original signal and the comfort noise generated by the CNI module at the beginning of single talk, as well as ensuring a smooth transition when moving from single talk to nearend speech or nearend background noise. For this seamless transition, the AES performs Overlap and Add (OLA) using a triangular window on the CNI-generated noise and the enhanced nearend signal s(n) from the FAGC, at the start of single talk and also at the end of single talk. During the start of single talk, the CNI-generated noise is multiplied by a rising ramp and is added to s(n) multiplied by a falling ramp. Similarly, during the end of single talk, the CNI-generated noise is multiplied by a falling ramp and is added to s(n) multiplied by a rising ramp. In an example, the attenuation or rising factor of the ramp is 0.3662 over a 10 ms period.
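
The start-of-single-talk cross-fade described above can be sketched as follows; a linear ramp shape is assumed here purely for illustration, and the buffer interface and names are hypothetical.

```c
/* Cross-fade into comfort noise at the start of single talk: CNI noise
 * is weighted by a rising ramp while the enhanced signal s(n) gets the
 * complementary falling ramp, and the two are summed (the OLA behavior
 * described above). n samples span roughly the 10 ms ramp period. */
void aes_crossfade_in(const float *s, const float *cni, float *out, int n)
{
    for (int i = 0; i < n; i++) {
        float r = (float)(i + 1) / (float)n;   /* rising ramp 0..1 */
        out[i] = r * cni[i] + (1.0f - r) * s[i];
    }
}
```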

The AGC output s(n) is classified into speech and noise frames. In an example, each frame is 10 ms in length. The classification uses an energy-based VAD algorithm. The average ambient noise level and Linear Predictive Coefficients (LPC) are extracted for each silence/noise frame.

The CNI unit uses 10th-order LPC parameters and a Gaussian random number generator for generating comfort noise, which is used for matching the spectrum of the nearend ambient noise. This simulated comfort noise replaces the residual echo without a transition noticeable by the user, when NLP is activated.
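
A sketch of this CNI scheme appears below: white Gaussian excitation is shaped by a 10th-order all-pole (LPC synthesis) filter. The Box-Muller generator and the coefficient sign convention A(z) = 1 + Σ aₖ z⁻ᵏ are illustrative assumptions, not details from the original implementation.

```c
#include <stdlib.h>
#include <math.h>

#define LPC_ORDER 10

/* Box-Muller Gaussian sample (illustrative generator choice). */
static float gauss(void)
{
    float u1 = (rand() + 1.0f) / (RAND_MAX + 2.0f);
    float u2 = (rand() + 1.0f) / (RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(6.28318531f * u2);
}

/* Comfort noise: scaled Gaussian excitation through the LPC synthesis
 * filter y[n] = e[n] - sum(a[k] * y[n-k-1]); 'mem' holds past outputs. */
void cni_generate(const float *a, float gain, float *out, int n,
                  float *mem /* LPC_ORDER entries, zero-initialized */)
{
    for (int i = 0; i < n; i++) {
        float y = gain * gauss();
        for (int k = 0; k < LPC_ORDER; k++)
            y -= a[k] * mem[k];              /* all-pole feedback */
        for (int k = LPC_ORDER - 1; k > 0; k--)
            mem[k] = mem[k - 1];             /* shift output history */
        mem[0] = y;
        out[i] = y;
    }
}
```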

Residual Echo and Noise Canceller

A block diagram of an example RENC 119 is shown in FIG. 3. The example RENC 119 uses modified frequency domain Wiener filtering or an MMSE-LSA estimator. In brief, the sum of an estimate of the short-term spectral magnitude of the ambient background noise and an estimate of the short-term spectral magnitude of the echo is used to estimate a spectral gain to be applied to an error signal, which includes residual echo, noise, and potentially, near end speech.

Assuming that the noise, v(n), is additive to the near-end speech signal s(n) at respective discrete time indexes, denoted by the variable n, the noisy near-end speech signal d(n) is represented in equation (1).

d(n)=s(n)+v(n)  (1)

The error signal e(n) from the Band Mixer will contain the noisy near-end speech d(n) and residual echo ry(n), as denoted in equation (2).

$e(n) = d(n) + ry(n) = s(n) + v(n) + ry(n) \qquad (2)$

Windowing

An asymmetric trapezoidal window is represented in equation (3), where D is the overlap length, L is the input frame length and M is the window length. Incoming samples are stored in a buffer of length L samples; the last D samples from the previous frame are appended to this buffer, and the remaining samples are taken as zeros to make up a buffer of length equal to the window length M. In one example, the value of M is 176 samples, L is 80 samples and D is 48 samples. Buffered samples are windowed using the trapezoidal window and then transformed into the frequency domain for processing, to reduce the jitter in packet transmission of a packet-based communication system such as VoIP.

$\begin{matrix}{{w(n)} = \left\{ \begin{matrix}{{\sin^{2}\left( {{{\pi \left( {n + 0.5} \right)}/2}D} \right)},} & {{0 \leq n < D},} \\{1,} & {{D \leq n < L},} \\{{\sin^{2}\left( {{{\pi \left( {n - L + D + 0.5} \right)}/2}D} \right)},} & {{L \leq n < {D + L}},} \\{0,} & {{{D + L} \leq n < M},}\end{matrix} \right.} & (3)\end{matrix}$

Frequency Domain Conversion: The error signal e(n) and the scaled echo estimate r′y(n) are divided into overlapping frames by the application of a trapezoidal window function, where r′ is a fixed correlation factor. The respective windowed signals are converted to the frequency domain using Fourier Transforms 160, 161 (e.g., a Short-Time Fourier Transform (STFT)).

Let E_(k)(l) and r′Y_(k)(l) represent the STFT of the error signal e(n) and the scaled echo estimate r′y(n), respectively, for frame index l and frequency bin index k. Then the error signal is given as

E _(k)(l)=S _(k)(l)+V _(k)(l)+Y _(k)(l)  (4)

where S_(k)(l), V_(k)(l) and Y_(k)(l) represent the STFT of the nearend signal s(n), the background noise v(n), and the residual echo y(n), respectively.

Reverberation Tracker

Since the AEC tail length used is short, it may not cancel the echoes completely when the actual reverberation is longer than the tail length of the echo cancellation filters. So, to cancel them, a moving average filter with a low attack rate and fast release rate is applied to the actual echo estimate obtained from the echo cancellation filter. The estimation from the moving average filter is controlled using appropriate logic when the actual reverberation is within the tail length of the echo cancellation filter. Equation (5) represents the lengthened echo estimate R_(k)(l).

$\begin{matrix}{{R_{k}(l)} = \left\{ \begin{matrix}{{{\alpha_{1}{R_{k}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{1}} \right)r^{\prime}{Y_{k}(l)}}},} \\{{if}\mspace{14mu} \left( {{r^{\prime}{Y_{k}(l)}} > {R_{k}\left( {l - 1} \right)}} \right)} \\{{{\alpha_{2}{R_{k}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{2}} \right)r^{\prime}{Y_{k}(l)}}},{else}}\end{matrix} \right.} & (5)\end{matrix}$

where α₂ < α₁ < 1.
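
Equation (5) is a per-bin dual-coefficient tracker and can be sketched as below; R carries R_k(l−1) between frames and rY is the scaled echo-estimate magnitude r′|Y_k(l)| (names illustrative).

```c
/* Reverberation tracker of equation (5): slow attack (alpha1 = 0.61),
 * fast release (alpha2 = 0.21). Returns the lengthened echo estimate. */
float reverb_track(float R, float rY, float alpha1, float alpha2)
{
    if (rY > R)
        return alpha1 * R + (1.0f - alpha1) * rY;  /* slow attack  */
    else
        return alpha2 * R + (1.0f - alpha2) * rY;  /* fast release */
}
```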

Noise Estimation

Noise estimation uses an external VAD. The VAD identifies the presence of voice activity in the input error signal coming from the ADF. When the VAD decision indicates a noise frame (i.e., VAD=0), the noise estimate V_(k)(l) is updated as per equation (6).

$\begin{matrix}{{V_{k}(l)} = \left\{ \begin{matrix}{{\alpha_{3}{V_{k}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{3}} \right){V_{k}(l)}}} & {{{if}\mspace{14mu} {VAD}} = 0} \\{V_{k}\left( {l - 1} \right)} & {otherwise}\end{matrix} \right.} & (6)\end{matrix}$

Cancellation Part

The total signal that is suppressed from the error signal in the frequency domain, at all frequency bins for a given frame l, is given as

NR _(k)(l)=V _(k)(l)+R _(k)(l)  (7)

Estimation Controller

Even though equation (7) represents the unwanted components that are to be subtracted from the error signal, over-estimation is possible on different platforms. This over-estimation can be due to the r′ value being greater than the ratio between the actual residual echo and the echo estimate. To control the over-estimation, a moving average of the error signal can be estimated using low pass filtering with dual α coefficients, as in equation (8).

$\begin{matrix}{{W_{k}(l)} = \left\{ \begin{matrix}{{\alpha_{4}{W_{k}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{4}} \right){E_{k}(l)}}} \\{{if}\mspace{14mu} \left( {{W_{k}\left( {l - 1} \right)} > {E_{k}(l)}} \right)} \\\; \\{{\alpha_{5}{W_{k}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{5}} \right){E_{k}(l)}}} \\{{if}\mspace{14mu} \left( {{W_{k}\left( {l - 1} \right)} \leq {E_{k}(l)}} \right)}\end{matrix} \right.} & (8)\end{matrix}$

To control the over-estimation of the cancellation part NR_(k)(l), a ceiling operation is performed and the modified cancellation part is estimated as given in equation (9).

P _(k)(l)=min(NR _(k)(l),W _(k)(l))  (9)
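Equations (8) and (9) together can be sketched as a single per-bin routine; W carries W_k(l−1) between frames, and all names are illustrative.

```c
/* Estimation controller: dual-alpha moving average of |E_k(l)| forms a
 * ceiling W_k(l) (eq. 8), and the cancellation part NR_k(l) is clipped
 * against it (eq. 9) to avoid over-estimation. */
float estimation_ceiling(float *W, float E, float NR,
                         float alpha4, float alpha5)
{
    if (*W > E)
        *W = alpha4 * (*W) + (1.0f - alpha4) * E;  /* eq. (8), upper branch */
    else
        *W = alpha5 * (*W) + (1.0f - alpha5) * E;  /* eq. (8), lower branch */
    return (NR < *W) ? NR : *W;                    /* eq. (9): P = min(NR, W) */
}
```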

The example RENC 119 filters out the cancellation part by modifying the spectral amplitudes of each frequency bin |E_(k)(l)| in equation (4), applying the gain estimates G_(k)(l) as below

S_(k)(l)=G_(k)(l)E_(k)(l), for 0≦G_(k)(l)≦1  (10)

The gain estimate G_(k)(l) is formed as a function of the a posteriori SNR γ_(k)(l) and the a priori SNR ξ_(k)(l). The γ_(k)(l) and ξ_(k)(l) are estimated as below, using statistical variances of the error signal or of the expected clean near-end speech and the cancellation part signal.

$\begin{matrix}{{\gamma_{k}(l)} \equiv \frac{{{E_{k}(l)}}^{2}}{E\left( {{P_{k}(l)}}^{2} \right)}} & (11) \\{{\xi_{k}(l)} \equiv \frac{E\left( {{S_{k}(l)}}^{2} \right)}{E\left( {{P_{k}(l)}}^{2} \right)}} & (12)\end{matrix}$

The statistical variance of clean near-end speech, E(|S_(k)(l)|²), for the estimation of ξ_(k)(l) is estimated using the Decision-Directed (DD) method [1] proposed by Ephraim and Malah, using 0<α<1, as follows.

$\begin{matrix}{{\xi_{k}(l)} = {{\alpha \frac{{{S_{k}\left( {l - 1} \right)}}^{2}}{E\left( {{P_{k}(l)}}^{2} \right)}} + {\left( {1 - \alpha} \right){{MAX}\left( {{{\gamma_{k}(l)} - 1},0} \right)}}}} & (13)\end{matrix}$

FIG. 4 shows a block diagram of a gain estimator 175. The G_(k)(l) function is formed using: (1) frequency domain Wiener filtering, or (2) an MMSE-LSA estimator.

Frequency Domain Wiener Filtering: The Wiener filter is a popular adaptive technique that has been used in many enhancement methods. The approach is based on optimal filtering; the aim is to find the optimal filter that minimizes the mean square error between the desired signal (the clean signal) and the estimated output. The Wiener filter gain G_(k)(l) is estimated by solving an equation in which the derivative of the mean square error with respect to the filter coefficients is set to zero:

$\begin{matrix}{{G_{k}^{W}(l)} = \frac{\xi_{k}(l)}{{\xi_{k}(l)} + 1}} & (14)\end{matrix}$

The Wiener filter emphasizes portions of the spectrum where the SNR is high, and attenuates portions of the spectrum where the SNR is low. Iterative Wiener filtering constructs an optimal linear filter using estimates of both the underlying speech and underlying noise spectra.
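
Combining equations (11), (13) and (14), one Wiener gain update per bin can be sketched as below. Argument names are illustrative; S_prev2, P2 and E2 stand for |S_k(l−1)|², E(|P_k(l)|²) and |E_k(l)|², and alpha is the DD smoothing factor (0.98 in Table 1).

```c
/* Decision-directed a priori SNR (eq. 13) followed by the Wiener gain
 * (eq. 14), for a single frequency bin. */
float wiener_gain(float E2, float P2, float S_prev2, float alpha)
{
    float gamma = E2 / P2;                 /* a posteriori SNR, eq. (11) */
    float inst  = gamma - 1.0f;            /* instantaneous SNR estimate */
    if (inst < 0.0f) inst = 0.0f;          /* MAX(gamma - 1, 0) */
    float xi = alpha * (S_prev2 / P2) + (1.0f - alpha) * inst;  /* eq. (13) */
    return xi / (xi + 1.0f);               /* eq. (14) */
}
```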

Minimum Mean-Square Error Log Spectral Amplitude (MMSE-LSA):

This technique assumes that the Fourier expansion coefficients of the noise components (V_(k)(l) and RY_(k)(l)) and of the near-end speech are statistically independent, and that they follow a Gaussian distribution. Log-spectra are used in distortion measures, which motivates examining the effect of an amplitude estimator constrained to minimizing the mean-squared error of the log-spectra. Let A_(k) be the actual amplitude of the near-end speech signal and Ā_(k) be the estimated amplitude of the near-end speech signal. The cost function used to estimate the gain is given by

E{(log A _(k)−log Ā _(k))²}  (15)

The gain function is given by equation (16):

$\begin{matrix}{{G_{k}^{LSA}(l)} = {\frac{\xi_{k}(l)}{1 + {\xi_{k}(l)}}\exp \left\{ {\frac{1}{2}{\int_{v_{k}}^{\infty}{\frac{^{- t}}{t}{t}}}} \right\}}} & (16)\end{matrix}$

Since the evaluation of the integral in the exponent of equation (16) is very complex, the exponential integral in (16) can be evaluated using a functional approximation, shown in equation (17).

$\begin{matrix}{{G_{k}^{LSA}(l)} = {\left( \frac{\xi_{k}(l)}{1 + {\xi_{k}(l)}} \right){\exp \left( \frac{^{v_{k}{(l)}}}{2} \right)}}} & (17)\end{matrix}$

where v_(k)(l) and e_(v_k(l)) are defined in the following equations (18) and (19), respectively.

$\begin{matrix}{{v_{k}(l)} = {\frac{\xi_{k}(l)}{1 + {\xi_{k}(l)}}{\gamma_{k}(l)}}} & (18) \\{^{v_{k}{(l)}} = \left\{ \begin{matrix}{{{{- 2.31}\mspace{14mu} \log_{10}{v_{k}(l)}} - 0.6},} & {{v_{k}(l)} < 0.1} \\{10^{- {({{0.52\; {v_{k}{(l)}}} + 0.26})}},} & {{v_{k}(l)} > 1} \\{{{{- 1.54}\mspace{14mu} \log_{10}{v_{k}(l)}} + 0.166},} & {otherwise}\end{matrix} \right.} & (19)\end{matrix}$

Gain Smoothing

To avoid abrupt changes in the gain over time, gain smoothing is done as below.

$\begin{matrix}{{G_{k}(l)} = \left\{ \begin{matrix}{{\alpha_{5}{G_{k}^{W}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{5}} \right){G_{k}^{W}(l)}}} & {{if}\mspace{14mu} \left( {{G_{k}^{W}\left( {l - 1} \right)} > {G_{k}^{W}(l)}} \right)} \\{{\alpha_{6}{G_{k}^{W}\left( {l - 1} \right)}} + {\left( {1 - \alpha_{6}} \right){G_{k}^{W}(l)}}} & {{if}\mspace{14mu} \left( {{G_{k}^{W}\left( {l - 1} \right)} \leq {G_{k}^{W}(l)}} \right)} \\{G_{k}^{W}(l)} & {{if}\mspace{14mu} \left( {l < T} \right)}\end{matrix} \right.} & (20)\end{matrix}$

2D Filtering: To smooth abrupt changes in the gain estimates across the frequency bins, smoothing is done as below.

$G_{k}^{F}(l) = \left(\alpha_{7} G_{k}(l-1) + \alpha_{8} G_{k}(l)\right) \cdot \frac{1}{\alpha_{7} + \alpha_{8}}, \quad \text{for } k > 1 \qquad (21)$
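
Equations (20) and (21) can be sketched together as one pass over the bins; Gs carries the smoothed gains G_k(l−1) between frames, and the startup branch for l < T is included (names illustrative).

```c
/* Gain smoothing (eq. 20) plus the weighted filtering of eq. (21).
 * Gw/Gw_prev are the instantaneous gains for frames l and l-1; Gs holds
 * the smoothed gains of the previous frame and is updated in place. */
void smooth_gains(float *Gs, float *Gf, const float *Gw,
                  const float *Gw_prev, int bins, int l, int T,
                  float a5, float a6, float a7, float a8)
{
    for (int k = 0; k < bins; k++) {
        float g;
        if (l < T)
            g = Gw[k];                                /* eq. (20), startup */
        else if (Gw_prev[k] > Gw[k])
            g = a5 * Gw_prev[k] + (1.0f - a5) * Gw[k];
        else
            g = a6 * Gw_prev[k] + (1.0f - a6) * Gw[k];
        /* eq. (21): weighted average of previous and current gain */
        Gf[k] = (k > 1) ? (a7 * Gs[k] + a8 * g) / (a7 + a8) : g;
        Gs[k] = g;
    }
}
```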

TABLE 1
Constants used by RENC 119

  Constant  Value    Remarks
  α₁        0.61     Reverberation Tracker smoothing factor
  α₂        0.21     Reverberation Tracker smoothing factor
  α₃        0.13     Noise estimation smoothing factor
  α₄        0.61     Estimation Controller smoothing factor
  α₅        0.21     Estimation Controller smoothing factor
  α         0.98     Decision Directed smoothing factor
  α₅        0.98     Gain estimation smoothing factor
  α₆        0.28     Gain estimation smoothing factor
  α₇        7        2D filtering smoothing factor
  α₈        1        2D filtering smoothing factor
  r′        2.8      Expected ratio between residual echo and echo estimate
  T         40       Initial 40 frames
  L         80       Frame size of 10 msec
  D         48       Overlap size of 6 msec
  M         176      Window length

Overlap and Add (OLA)

The estimated gain is applied to the error signal as per equation (10), and the enhanced STSA S_(k)(l) is obtained. The enhanced near-end speech s(n) is then reconstructed by applying the inverse FFT to the enhanced STSA, |S_(k)(l)|, with the noisy phase of E_(k)(l), followed by an appropriate overlap-and-add (OLA) procedure to compensate for the window effect and to alleviate abrupt signal changes between two consecutive frames.

Frequency Domain Automatic Gain Controller (FAGC)

The smoothed gain G_(k)^(F)(l) and the enhanced speech frequency bins S_(k)(l) are used for estimating a gain for each frequency bin to achieve a target power level in the output. The high level architecture of the proposed AGC is shown in FIG. 5. The VAD block estimates the presence of voice activity for each frequency bin. If voice activity is detected on at least one frequency bin, a new gain is estimated by the computation module. The new gain is then applied to the enhanced speech S_(k)(l).

Voice Activity Detection (VAD)

Since calculating the AGC gain for silence frames is not needed, classification of a frame as speech/silence is required for the gain calculations. Since the AGC is supposed to apply gain only to the nearend signal, it should not amplify echo or noise regions. The suppressor gain G_(k)^(F)(l) is expected to be lower than unity for echo and noise regions. The suppressor gain therefore also can be used for deciding the presence of nearend speech activity, as below.

bvad_(k)(l)=1, if (G_(k)^(F)(l)>λ₁)

vad(l)=1, if (bvad_(k)(l)==1) for any k  (22)

where bvad_(k)(l) represents the VAD decision for the k^(th) frequency bin in the l^(th) frame, and vad(l) represents the global VAD decision for the l^(th) frame.

The VAD-activity decisions for the individual bins in a given frame are considered, and if at least one bin is classified as a speech bin the frame is classified as a speech frame, otherwise as a silence frame.
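
Equation (22) can be sketched as below; λ₁ is the per-bin decision threshold (0.732 in Table 2), and names are illustrative.

```c
/* Per-bin VAD from the suppressor gain (eq. 22) plus the global frame
 * decision: a bin is speech if its gain stayed near unity, and the
 * frame is speech if any bin is speech. */
int frame_vad(const float *Gf, int *bvad, int bins, float lambda1)
{
    int vad = 0;
    for (int k = 0; k < bins; k++) {
        bvad[k] = (Gf[k] > lambda1) ? 1 : 0;
        if (bvad[k]) vad = 1;
    }
    return vad;
}
```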

Gain Computation Unit

The Gain Computation Unit estimates a global frame gain from the RMS power level of the nearend speech. The gain for each frequency bin is estimated using the global frame gain G^(M)(l) and low pass filtering. The total speech power level is given by

P_(sp)(l)=Σ(S_(k)²(l)*bvad_(k)(l))  (23)

Similarly, noise power is estimated as

P_(n)(l)=Σ(S_(k)²(l))−P_(sp)(l)  (24)

Global frame gain is estimated as given below,

$\begin{matrix}{{G_{r}^{M}(l)} = {\frac{1}{\sqrt{{msqr}(l)}}*({TL})}} & (25)\end{matrix}$

where TL is a calibrated target power level, considering the frame size and spectral leakage during windowing, for the given actual target level in dB. The initial mean square value msqr(0) is given by equation (26).

msqr(0)=(TL*TL)  (26)

Mean square values msqr(l) are estimated using a LPF as given below

msqr(l)=msqr(l−1)+P′ _(m)(l)  (27)

where P′_(m)(l) is given by equation (28), and P_(m)(l) is given by equation (29).

$tmp = P_{m}(l) - msqr(l-1)$

$P'_{m}(l) = \begin{cases} P_{m}(l) \cdot \lambda_{2}, & \text{if } tmp > 0 \\ P_{m}(l) \cdot \lambda_{3}, & \text{otherwise} \end{cases} \qquad (28)$

$P_{m}(l) = P_{sp}(l) + P_{n}(l) \cdot \lambda_{4} \qquad (29)$

The calculated gain is limited to the range of allowable maximum and minimum values before it is applied to the frames. In a case where a low-amplitude to high-amplitude level transition is encountered in the input, the computed gain may exceed the limit and cause a momentary transition spike. This phenomenon can be minimized through a condition that checks for gain blow-up, limiting the gain to a maximum gain value G_(MAX) to avoid any spiking and ensure a smooth transition.

$\begin{matrix}{{G_{r}^{M}(l)} = \left\{ \begin{matrix}G_{MAX} & {{{if}\mspace{14mu} {G_{r}^{M}(l)}} > G_{MAX}} \\G_{MIN} & {{{if}\mspace{14mu} {G_{r}^{M}(l)}} < G_{MIN}}\end{matrix} \right.} & (30)\end{matrix}$

To avoid high fluctuations between two frames, which would result in signal distortion, the gain is smoothed over time as given below.

$tmp = G_{r}^{M}(l) - G^{M}(l-1)$

$G^{M}(l) = \begin{cases} G^{M}(l-1) + tmp \cdot \lambda_{5}, & \text{if } tmp > 0 \\ G^{M}(l-1) + tmp \cdot \lambda_{6}, & \text{otherwise} \end{cases} \qquad (31)$

Different smoothing factors are applied for transitions from noise to speech and from speech to noise, respectively. These values are chosen in such a way that the attack time is faster than the release time. The attack time should be fast to prevent harsh distortion when the amplitude rapidly increases, and the decay time should be relatively longer to avoid a chopping effect and assure low distortion.
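
The attack/release smoothing of equation (31) can be sketched per frame as below (names illustrative):

```c
/* Asymmetric smoothing of the regulated global gain (eq. 31): faster
 * attack (lambda5) than release (lambda6). Gm holds G^M(l-1) on entry
 * and G^M(l) on return. */
float smooth_global_gain(float *Gm, float Gr, float lambda5, float lambda6)
{
    float tmp = Gr - *Gm;
    *Gm += tmp * ((tmp > 0.0f) ? lambda5 : lambda6);
    return *Gm;
}
```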

The computed gain is applied to speech and noise bins separately, based on the VAD activity decision for each bin. To avoid distortion across frequency bins due to high gain differences across neighboring frequency bins, 2-D filtering on the individual VAD decisions of each frequency bin is applied.

$bvad_{k}^{2d}(l) = \begin{cases} 1, & \text{if } bvad_{i}(l) = 1 \text{ for } i = k-1,\, k, \text{ or } k+1 \\ 0, & \text{otherwise} \end{cases} \qquad (32)$

With the knowledge of voice activity for each frame, individual frames are treated separately for the gain calculation. The gain for unvoiced portions that contain only background noise is set to unity. The AGC gain calculated for a given frame is given below for speech frequency bins, bvad_(k)^(2d)(l)=1.

$tmp = G^{M}(l) - G_{k}^{AGC}(l-1)$

$G_{k}^{AGC}(l) = \begin{cases} G_{k}^{AGC}(l-1) + tmp \cdot \lambda_{7}, & \text{if } tmp > 0 \\ G_{k}^{AGC}(l-1) + tmp \cdot \lambda_{8}, & \text{otherwise} \end{cases} \qquad (33)$

If bvad_(k)^(2d)(l) indicates noise, the AGC gain G_(k)^(AGC)(l) is estimated as below.

$\begin{matrix}{{G_{k}^{AGC}(l)} = \left\{ \begin{matrix}{{G_{k}^{AGC}\left( {l - 1} \right)}*\lambda_{9}} & {{if}\mspace{14mu} \begin{pmatrix}\left. \left( {{G_{k}^{AGC}\left( {l - 1} \right)} > 1} \right) \right.|| \\\left( {{G_{k}^{AGC}\left( {l - 1} \right)} > {G^{M}(l)}} \right)\end{pmatrix}} \\{G_{k}^{AGC}\left( {l - 1} \right)} & {otherwise}\end{matrix} \right.} & (34)\end{matrix}$

Finally, the computed gain is applied to the respective frequency bins of the enhanced speech coming out of the residual echo suppressor.

S′ _(k)(l)=G _(k) ^(AGC)(l)*S _(k)(l)  (35)

After the gain multiplication in the frequency domain, the frame is inverse transformed and the segments are put in order by the overlap-and-add (OLA) method discussed in earlier sections.

TABLE 2
Constants used by FAGC

  Constant  Value    Remarks
  λ₁        0.732    VAD decision factor for each bin
  λ₂        0.793    Multiplication factor
  λ₃        0.183    Multiplication factor
  λ₄        0.5      Multiplication factor to noise power
  G_MAX     8        Gain limitation
  G_MIN     0.00015  Gain limitation
  λ₃        32       Global gain
  λ₄        0.6      Smoothing factor
  λ₅        0.457    AGC gain
  λ₆        0.793    Smoothing factors
  λ₇        0.996    AGC gain limiter

Non-Linear Processor (NLP) Controller

FIGS. 6 and 7 depict example aspects of NLP control and NLP decision logic (which is used in NLP control), which are performed in NLP controller 109. NLP controller 109 enables or disables NLP to completely remove the residual echo during single talk. Also, it is a goal to ensure no signal clipping occurs while passing from single talk to double-talk and vice versa. The NLP decisions are made from a combination of the cross correlation between the modified microphone signal and the enhanced error signal, normalized by the power of the microphone signal, and the same cross correlation normalized by the power of the error signal.

NLP controller 109 outputs NLP decisions for discrete time intervals, nlp(n). NLP controller 109 uses several inputs in producing NLP decisions. The production of these inputs is collectively referred to as decision parameters estimation 305. These inputs include the correlation between the error signal and the microphone signal, ed_(enr)(n). This correlation also can be used for echo detection, such that ed_(enr)(n) also can be used as an indication of echo. Other inputs include normalization parameters, such as the error energy e_(enr)(n) and the microphone signal energy d_(enr)(n), the noise energy v_(enr)(n), a convergence indicator conv(n), the long term average of the reference signal amplitude ly(n), the absolute value of the error signal e_(abs)(n), and the absolute value of the modified microphone signal. The NLP also uses counters for stability checking. These counters include counts for hangover. Before starting NLP decision making, the hangover counts and NLP decision parameters are set as given below.

nlp(n)=0

distortion(n)=0

st_hngovr(n)=st_hngovr(n−1)

dt_hngovr(n)=dt_hngovr(n−1)

nlp_(enr)(n)=nlp_(enr)(n−1)  (36)

The input signals (microphone signal and error signal) to the NLP controller 109 are scaled to avoid saturation in computations using 16-bit registers. The scaling factor can be experimentally determined. The scaled-down signals are called the modified microphone signal d′(n) and the enhanced error signal e_(n)(n), and are estimated by equation (37) below.

d′(n)=d(n−D₁)/16

e_(n)(n)=s′(n)/16  (37)

The cross correlation ed_(enr)(n) between the modified microphone signal d′(n) and the enhanced error signal e_(n)(n) is called the echo indicator parameter and is a major parameter in deciding NLP activation/de-activation (the decision to activate, not activate, or deactivate). This parameter is estimated as below

$\begin{matrix}{{{ed}_{enr}(n)} = {{{ed}_{enr}\left( {n - 1} \right)} - \left( {{d^{\prime}\left( {n - K} \right)}*{e_{n}\left( {n - K} \right)}} \right) + \left( {{d^{\prime}(n)}*{e_{n}(n)}} \right)}} & (38)\end{matrix}$

Other important parameters include the normalization factors, namely the microphone energy d_(enr)(n) and the enhanced error energy e_(enr)(n), which can be estimated as in equation (39)

$\begin{matrix}{{{d_{enr}(n)} = {{d_{enr}\left( {n - 1} \right)} - \left\lbrack {{d^{\prime}\left( {n - K} \right)}*{d^{\prime}\left( {n - K} \right)}} \right\rbrack + \left( {{d^{\prime}(n)}*{d^{\prime}(n)}} \right)}}\mspace{20mu} {{e_{enr}(n)} = {{e_{enr}\left( {n - 1} \right)} - \left\lbrack {{e_{n}\left( {n - K} \right)}*{e_{n}\left( {n - K} \right)}} \right\rbrack + \left( {{e_{n}(n)}*{e_{n}(n)}} \right)}}} & (39)\end{matrix}$

Noise energy is another decision parameter, used mainly for breaking hangover. Noise energy is estimated using a moving average filter as per equation (40).

v_(enr)(n)=v_(enr)(n−1)+β₁(e_(enr)(n)−v_(enr)(n−1)), if (e_(enr)(n)>v_(enr)(n−1))

v_(enr)(n)=v_(enr)(n−1)+β₂(e_(enr)(n)−v_(enr)(n−1)), otherwise  (40)

Five counters are used for stability and other purposes. The startup indicator counter m_cnt(n) is used to indicate initial session timing. This counter also indicates the number of samples processed by the proposed system before ADF convergence is achieved. The counter's maximum value is limited by the register length being used, to avoid overflow.

m_cnt(n)=m_cnt(n−1)+1

if (m_cnt(n−1)<β₃)  (41)

Another counter counts recent noise frames. This counter uses the VAD decisions (VAD(l)) from RENC 119.

$\begin{matrix}{{{v\_ cnt}(l)} = \left\{ \begin{matrix}{0,} & {{if}\mspace{14mu} \left( {{{VAD}(l)}==1} \right)} \\{{{{v\_ cnt}\left( {l - 1} \right)} + 1},} & {else}\end{matrix} \right.} & (42)\end{matrix}$

Another counter is an adaptation counter, adp_cnt(n), used to indicate the number of samples during which the ADFs have maintained convergence. The adaptation counter allows taking hard NLP decisions during the start of convergence. After ADF convergence, the adaptation counter does not factor into the NLP decision logic.

$\begin{matrix}{{{adp\_ cnt}(n)} = \left\{ \begin{matrix}{{{{adp\_ cnt}\left( {n - 1} \right)} + 1},} & {{if}\mspace{14mu} \left( {{{ADAP}(n)}==1} \right)} \\{{{adp\_ cnt}\left( {n - 1} \right)},} & {else}\end{matrix} \right.} & (43)\end{matrix}$

Another counter is the suppressor activated counter, sup_cnt(n), which is similar to the startup indicator counter m_cnt(n). The suppressor activated counter indicates the number of samples during which the NLP is activated before convergence of the ADF. This counter is incremented by one for every NLP ON decision before convergence is achieved for a speech frame. The suppressor activated counter also does not factor into the NLP decision logic after ADF convergence. The balance convergence counter, con_cnt(n), indicates the number of samples for which the ADFs have converged within the expected convergence.

The last counter, called the hist counter, his_cnt(n), is used to check the stability of the convergence. The other decision parameters, the absolute short term average error signal e_(abs)(n), the absolute short term average microphone signal d_(abs)(n), and the long term average of the reference signal amplitude ly(n), are estimated as per the equations below.

$tmp = s'(n) - e_{abs}(n-1)$

$e_{abs}(n) = \begin{cases} e_{abs}(n-1) + tmp \cdot \beta_{4}, & \text{if } \left(d(n-D_{1}) < \beta_{5}\right) \text{ and } \left(d_{abs}(n) < d(n-D_{1})\right) \\ e_{abs}(n-1) + tmp \cdot \beta_{6}, & \text{otherwise} \end{cases} \qquad (44)$

$tmp = d(n-D_{1}) - d_{abs}(n-1)$

$d_{abs}(n) = \begin{cases} d_{abs}(n-1) + tmp \cdot \beta_{4}, & \text{if } \left(d(n-D_{1}) < \beta_{5}\right) \text{ and } \left(d_{abs}(n) < d(n-D_{1})\right) \\ d_{abs}(n-1) + tmp \cdot \beta_{6}, & \text{otherwise} \end{cases} \qquad (45)$

$ly(n) = ly(n-1) \cdot (1 - \beta_{7}) + x_{2}(n) \cdot \beta_{7} \qquad (46)$

D₁ is a delay compensation factor for synchronizing the microphone signal d(n) and the error signal received from the residual echo remover, s′(n).

Another decision parameter is a convergence indicator, which can be estimated (detection 307) as per pseudocode (47). When the ADF reaches convergence during single talk, the correlation between the enhanced error signal and the modified microphone signal decreases. Decreased correlation thus can be used as a detector for ADF convergence. For the detection of convergence, the cross correlation ed_(enr)(n) is normalized by the microphone energy d_(enr)(n) and compared with a predefined threshold. Since RENC 119 cancels background noise as well, this normalized cross correlation check may pass during regions with no speech. So, convergence validation is checked during the presence of speech activity, using v_cnt(l).

if ((conv(n − 1) == 0) && (v_cnt(l) == 0)) {
    if (d_(enr)(n) * β₉ > ed_(enr)(n)) {
        if ((his_cnt(n − 1) > β₁₀) && (adp_cnt(n) > β₃₇)) {
            conv(n) = 1
            sup_cnt(n) = β₁₁
            m_cnt(n) = β₃
        } else {
            his_cnt(n) = his_cnt(n − 1) + 1
        }
    } else {
        if (his_cnt(n − 1) > β₃₈) {
            con_cnt(n) = con_cnt(n − 1) + his_cnt(n − 1)
            if ((con_cnt(n) > β₁₀) && (adp_cnt(n) > β₃₇)) {
                conv(n) = 1
                sup_cnt(n) = β₁₁
                m_cnt(n) = β₃
            }
        }
        his_cnt(n) = 0
    }
}  (47)

Decision Logic—309 & 311

FIG. 7 depicts an example of the NLP decision logic performed to update NLP decisions, in elements 309/311 of FIG. 6. The example of FIG. 7 is exemplary and not limiting; a person of ordinary skill can adapt these disclosures to other implementations. The decision logic has two main stages: (1) decision before convergence, and (2) decision after convergence. A Startup Decision Maker 354 is the NLP decision maker before expected convergence is achieved. There are five sub-stages in the decision making after expected convergence is achieved. They are detailed in the subsequent subsections.

Startup Decision Maker 354

Startup Decision Maker 354 uses a relaxed threshold, so there is a possibility that NLP might sometimes be activated during double talk. The startup decision maker is active only for a short time during startup, and thus does not have a major effect on a conversation. Also, the occurrence of double talk during the start of a call is uncommon.

if ((m_cnt(n) < β₃) && (sup_cnt(n) < β₁₁)
    && (d_(enr)(n) * β₁₂ > ed_(enr)(n))) {
    nlp(n) = 1
    if (v_cnt(l) == 0) {
        sup_cnt(n) = sup_cnt(n) + 1
    }
}  (48)

Coarse Decision Maker 356

A Coarse Decision Maker 356 uses the normalized cross correlation ed_(enr)(n)/d_(enr)(n) for decision making. If the validation check is passed, the DT hangover is broken and the ST hangover is set to β₁₄.

if (d_(enr)(n) * β₁₃ > ed_(enr)(n)) {
    nlp(n) = 1
    st_hngovr(n) = β₁₄
    dt_hngovr(n) = −1
    distortion(n) = 1
}  (49)

Distorted Error Masker

A Distorted Error Masker 358 is an energy comparator for low level signals. When the error signal is at a low level and also is much lower than the microphone signal level, this decision directs NLP activation. Activating the NLP under such conditions reduces situations where distorted low level noise can be heard by the user.

if (((d_(enr)(n) > e_(enr)(n) * β₁₅) && (e_(enr)(n) < β₁₆)) ||
    ((d_(enr)(n) > e_(enr)(n) * β₁₇) && (e_(enr)(n) < β₁₈))) {
    nlp(n) = 1
    dt_hngovr(n) = −1
}  (50)

Coarse Decision Maker 360

A Coarse Decision Maker 360 uses the normalized cross correlation ed_(enr)(n)/e_(enr)(n) as a basis for outputting decisions for NLP activation. If the validation check is passed, the DT hangover is broken and the ST hangover is set to β₂₀ if it is lower than that.

if (e_(enr)(n) > (ed_(enr)(n) * β₁₉)) {
    nlp(n) = 1
    if (st_hngovr(n) < β₂₀)
        st_hngovr(n) = β₂₀
    dt_hngovr(n) = −1
    distortion(n) = 1
}  (51)

Double Talk Hangover Check

If the NLP decision is OFF after the above validations, a DT Hangover Check 362 is performed. The DT hangover is checked so that the nearend signal continues to be passed out of the AES until the current point. The hangover counter is decremented by one for every sample processed.

if (dt_hngovr(n) > 0) {
    dt_hngovr(n) = dt_hngovr(n) − 1
}  (52)

Coarse Decision Maker 365

If all of the preceding decision making logic failed, then the coarse decision maker 365 becomes active (this example shows a serial flow, where any positive decision causes an NLP=1 decision, and the remainder of the flow need not be performed). Coarse decision maker 365 applies a different threshold on the normalized cross correlation ed_(enr)(n)/d_(enr)(n), based on the convergence state of the adaptive filter, as given below.

if ((d_(enr)(n) * β₂₁ > ed_(enr)(n)) ||
    ((d_(enr)(n) * β₂₂ > ed_(enr)(n)) && (conv(n) == 0))) {
    nlp(n) = 1
    dt_hngovr(n) = 0
    if (d_(enr)(n) * β₂₃ > ed_(enr)(n))
        st_hngovr(n) = β₂₄
}  (53)

The flow of FIG. 7 completes by returning a decision of nlp(n)=0 or nlp(n)=1 to the flow of FIG. 6.

NLP Energy Threshold Updating 315

If the NLP Decision Logic enables NLP, then the NLP energy threshold is updated 315 as given below. This threshold will be used later for breaking the ST hangover.

tmp = e_(enr)(n) − nlp_(enr)(n)
if (tmp > 0)
    nlp_(enr)(n) = nlp_(enr)(n) + tmp * β₂₅
else
    nlp_(enr)(n) = nlp_(enr)(n) + tmp * β₂₆  (54)

Double Talk Hangover Breaker 317

Sometimes there is a chance of residual echo being passed to the user due to hangover. So, there should be a decision or other mechanism to break the DT hangover based on a sudden fall in nearend energy or a sudden rise in echo energy. The DT hangover is broken in this scenario based on the below condition:

if ((e_(enr)(n) * β₂₇ > d_(enr)(n)) || (d_(enr)(n) > e_(enr)(n) * β₂₈)) {
    dt_hngovr(n) = −1
    nlp(n) = 1
}  (55)

Double Talk Hangover Setting 322

If the DT hangover breaking conditions failed and the energy of the error signal is more than a predefined threshold, the ST hangover is broken and the DT hangover is set to another pre-defined value, as in the example below.

if (e_(enr)(n) > β₂₉) {
    dt_hngovr(n) = β₂₀
    st_hngovr(n) = −1
}  (56)

Single Talk Hangover Breaker 320

The estimated NLP threshold is used for breaking the ST hangover. The ST hangover breaking validation condition is given below.

if (((e_(enr)(n) > nlp_(enr)(n) * β₃₀) || (e_(enr)(n) > (nlp_(enr)(n) + β₃₁)))
    && (e_(enr)(n) > β₃₂) && (distortion(n) == 0)) {
    st_hngovr(n) = −1
}  (57)

If the hangover breaking validation failed and the ST hangover count is greater than 0 (325), NLP is activated and the ST hangover count is decremented by 1 (329).

Refine NLP Decision and ST Hangover 331

Refining the NLP decision and the ST hangover is done based on the long term average amplitude of the reference signal ly(n) and the absolute averages of the error and modified microphone output signals, as given below.

if (ly(n) < β₃₃) {
    nlp(n) = 0
    st_hngovr(n) = −1
}  (58)

if ((e_(abs)(n) > d_(abs)(n) + β₃₄) ||
    ((e_(abs)(n) > d_(abs)(n) * β₃₅) && (d_(abs)(n) > 0))) {
    nlp(n) = 1
    st_hngovr(n) = β₃₆
}  (59)

TABLE 3
Constants used by NLP controller 109

  Constant  Value    Remarks
  β₁        0.0189   Noise energy smoothing factor
  β₂        0.1831   Smoothing factor
  β₃        64000    Max. value of startup indication counter
  β₄        0.5      Smoothing factor
  β₅        50       Constant
  β₆        0.03     Smoothing factor
  β₇        128      Constant
  β₈        480      Max. value of adp_cnt
  β₉        0.0061   Multiplication factor
  β₁₀       1400     his_cnt limit
  β₁₁       32000    Constant
  β₁₂       0.4577   Multiplication factor
  β₁₃       0.0061   Multiplication factor
  β₁₄       4000     st_hngovr limit
  β₁₅       3        Constant
  β₁₆       7500     e_(enr) limit
  β₁₇       2        Constant
  β₁₈       2500     e_(enr) limit
  β₁₉       2        Constant
  β₂₀       540      st_hngovr limit
  β₂₁       0.061    Multiplication factor
  β₂₂       0.3662   Multiplication factor
  β₂₃       0.4577   Multiplication factor
  β₂₄       240      st_hngovr limit
  β₂₅       0.097    NLP energy smoothing factor
  β₂₆       0.0061   Smoothing factor
  β₂₇       21845    dt_hngovr limit
  β₂₈       0.0313   Multiplication factor
  β₂₉       35000    e_(enr) limit
  β₃₀       4        Constant
  β₃₁       0.2136   Multiplication factor
  β₃₂       12000    e_(enr) limit
  β₃₃       6400     ly limit
  β₃₄       900      Constant
  β₃₅       8        Constant
  β₃₆       1200     st_hngovr limit
  β₃₇       300      Constant
  β₃₈       20       st_hngovr constant
  K         300      Index

Embodiments can be implemented in fixed-point C on a RISC application processor, such as an Advanced RISC Machines (ARM) processor, such as an ARM 9E. In some implementations, other applications can execute on the same application processor, and in some examples, processes can have preemptive scheduling provided by an OS kernel for time-critical tasks. Good performance is shown on real platforms that have general purpose application processors, such as laptops, tablets, and desktops, including Microsoft Windows desktop, laptop and mobile devices, as well as Android-based handsets. To demonstrate the proposed system's performance, ensemble average results are provided in this section.

Real-time captured farend and microphone output signals on different platforms are fed to the AEC module, and the respective blocks' output signals are captured and analyzed. FIG. 8 depicts the ensemble average of ERLE for the single talk test case. During the single talk test case, the microphone output signal has echo and background noise only.

In FIG. 8, it can be seen that the ADFs (402) were able to provide an ERLE of only 8 dB. With the Residual Echo and Noise Canceller (RENC) 119, ERLE can be increased up to 60 dB using modified Wiener gain estimation (404) and 40 dB using modified MMSE-LSA gain estimation (406). The proposed method based on MMSE-LSA produces much less residual noise when compared to Wiener filtering, while there is no perceptible difference in the enhanced speech quality between the two methods. Further, the residual noise sounds more uniform (more white), which is subjectively preferable.

FIG. 9 depicts the ensemble average of ERLE for the Double Talk (DT) test case. In FIG. 9, two DT regions are present. In all test cases, there is no clipping of nearend speech, and complete cancellation of background noise is observed.

FIGS. 10-13 depict aspects of the performance of an implementation of the proposed FAGC. From FIG. 10, it can be noted that the target level tracking of the proposed FAGC is fast and accurate.

NLP controller 109 performance for a real-time captured signal is depicted in FIG. 14. The captured signal has a combination of single talk, double talk and nearend signal. NLP is active during single talk and during echo-alone regions of double talk, and it is deactivated in all the nearend regions. FIG. 15 depicts the AES output for the NLP decisions. The AES output does not contain any residual echo.

Generally, any of the functions, methods, techniques or components described above can be implemented in modules using software, firmware, hardware (e.g., fixed logic circuitry), or any combination of these implementations. The terms “module,” “functionality,” “component”, “block” and “logic” are used herein to generally represent software, firmware, hardware, or any combination thereof.

In the case of a software implementation, the module, functionality, component or logic represents program code that performs specified tasks when executed on a processor (e.g., one or more CPUs). In one example, the methods described may be performed by a computer configured with software of a computer program product in machine readable form stored on a computer-readable medium. One such configuration of a computer-readable medium is a signal bearing medium and thus is configured to transmit the instructions (e.g., as a carrier wave) to the computing device, such as via a network. The computer-readable medium may also be configured as a non-transitory computer-readable storage medium, which is not a propagating signal bearing medium (e.g., an EM signal propagating in free space or over a wire). Examples of a computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.

The software may be in the form of a computer program comprising computer program code for configuring a computer to perform the constituent portions of described methods, or in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer, and where the computer program may be embodied on a computer readable medium. The program code can be stored in one or more computer readable media. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.

Those skilled in the art will also realize that all, or a portion of, the functionality, techniques or methods may be carried out by a dedicated circuit, an application-specific integrated circuit, a programmable logic array, a field-programmable gate array, or the like. For example, the module, functionality, component or logic may comprise hardware in the form of circuitry. Such circuitry may include transistors and/or other hardware elements available in a manufacturing process. Such transistors and/or other elements may be used to form circuitry or structures that implement and/or contain memory, such as registers, flip flops, or latches; logical operators, such as Boolean operations; mathematical operators, such as adders, multipliers, or shifters; and interconnects, by way of example. Such elements may be provided as custom circuits or standard cell libraries, macros, or at other levels of abstraction. Such elements may be interconnected in a specific arrangement. The module, functionality, component or logic may include circuitry that is fixed function and circuitry that can be programmed to perform a function or functions; such programming may be provided from a firmware or software update or control mechanism. In an example, hardware logic has circuitry that implements a fixed function operation, state machine or process.

Aspects of the present disclosure encompass software (as represented by data recorded on a non-transitory medium) which “describes” or defines the configuration of hardware that implements a module, functionality, component or logic described above, such as HDL (hardware description language) software, as is used for designing integrated circuits, or for configuring programmable chips, to carry out desired functions. That is, there may be provided a computer readable storage medium having encoded thereon computer readable program code for generating a processing block configured to perform any of the methods described herein, or for generating a processing block comprising any apparatus described herein.

The terms ‘processor’ and ‘computer’ are used herein to refer to any device, or portion thereof, with processing capability such that it can execute instructions, or a dedicated circuit capable of carrying out all or a portion of the functionality or methods, or any combination thereof.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. It will be understood that the benefits and advantages described above may relate to one example or may relate to several examples.

The actions of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate, unless indicated otherwise by context or explicitly. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.

What is claimed is:
 1. A machine-implemented method of echo cancellation in a full duplex voice communication system, comprising: performing acoustic echo cancellation on a near-end signal using a short time domain adaptive filter, to produce a filtered near-end signal that may have non-linear residual echo and noise; tracking, in the frequency domain, the non-linear residual echo in the filtered near-end signal to output an error signal; producing an estimate of a portion of the filtered near-end signal to be removed, as a combination of the non-linear residual echo and noise; imposing a limitation on the estimate based on a moving average of the error signal; controlling gains associated with a plurality of frequency bins in a Frequency domain Automatic Gain Controller (FAGC), based on the limited estimate, thereby suppressing the estimated portion of the filtered near-end signal to be removed; and refining respective gains associated with the plurality of frequency bins of the FAGC.
 2. The method according to claim 1, wherein the tracking comprises defining frequency bins calculated from the time domain signal by scaling an echo estimate produced by an acoustic echo canceller performing the acoustic echo cancellation.
 3. The method according to claim 2, wherein the time domain adaptive filter comprises a two-band, 32 millisecond tail length echo cancellation filtering unit, and the scaling comprises producing a time domain echo estimate and multiplying the time domain echo estimate by a fixed correlation factor (r′) that is pre-estimated for use with the echo cancellation filtering unit.
 4. The method according to claim 3, further comprising estimating an effect of the echo on a play-out signal using a moving average filter with a low attack rate (α₁) and a fast release rate (α₂).
 5. The method according to claim 4, wherein the estimating of the effect of the reverberation comprises estimating echo for a plurality of frequency bins, and the echo estimate R_(k)(l) for a k^(th) frequency bin at a frame index (l) is calculated using the relation R_(k)(l)=α₁R_(k)(l−1)+(1−α₁)r′Y_(k)(l) when (r′Y_(k)(l)>R_(k)(l−1)) and R_(k)(l)=α₂R_(k)(l−1)+(1−α₂)r′Y_(k)(l) when (r′Y_(k)(l)≦R_(k)(l−1)), wherein Y_(k)(l) is a Short Time Fourier Transform (STFT) of the echo estimate y(n) and r′ is the correlation factor.
 6. The method according to claim 1, wherein the producing of the estimate of a portion of the filtered near-end signal to be removed comprises: estimating a level of the noise using an external Voice Activity Detector (VAD) across the plurality of frequency bins, wherein the noise estimation for the k^(th) frequency bin at frame index l is calculated using the relation V_(k)(l)=α₃V_(k)(l−1)+(1−α₃)V_(k)(l) when (VAD=0) and V_(k)(l)=V_(k)(l−1) when (VAD=1), where V_(k)(l) is the noise estimation for the k^(th) frequency bin and α₃ is a smoothing factor; and calculating the estimate of the portion to be removed for the k^(th) frequency bin at frame index l using the relation NR_(k)(l)=V_(k)(l)+R_(k)(l), where R_(k)(l) is the lengthened echo estimate.
 7. The method according to claim 1, wherein the imposing of the limitation on the estimate comprises calculating an estimation threshold from a frequency domain error signal (E_(k)(l)) using dual alpha low pass Finite Impulse Response (FIR) filtering, wherein the estimation threshold comprises a respective threshold for each of the plurality of frequency bins, and the k^(th) frequency bin at frame index l is calculated using the relation W_(k)(l)=α₄W_(k)(l−1)+(1−α₄)*E_(k)(l) for (W_(k)(l−1)>E_(k)(l)) and W_(k)(l)=α₅W_(k)(l−1)+(1−α₅)*E_(k)(l) for (W_(k)(l−1)≦E_(k)(l)), wherein α₅ is an attack coefficient and α₄ is a release coefficient; and limiting the maximum value of the estimated echo and noise for each of the plurality of frequency bins, wherein for the k^(th) frequency bin at frame index l, the maximum value is calculated using the relation P_(k)(l)=minimum of {NR_(k)(l) and W_(k)(l)}, wherein NR_(k)(l) is the combined estimated echo and noise and W_(k)(l) is the estimation threshold.
8. The method according to claim 1, wherein the refining of the gains comprises: smoothing the gain for each of the plurality of frequency bins, wherein the smoothed gain for the k^(th) frequency bin at frame index l is calculated using the relation G_(k)(l)=α₃*G_(k)^(w)(l−1)+(1−α₃)*G_(k)^(w)(l) when (G_(k)^(w)(l−1)>G_(k)^(w)(l)), G_(k)(l)=α₄*G_(k)^(w)(l−1)+(1−α₄)*G_(k)^(w)(l) when (G_(k)^(w)(l−1)≦G_(k)^(w)(l)), and G_(k)(l)=α₄G_(k)^(w)(l) when (l<T), wherein G_(k)^(w)(l) is the instantaneous gain, α₃ is an attack coefficient, α₄ is a release coefficient and T is a gain smoothing threshold; and filtering the smoothed gain for each of the plurality of frequency bins, wherein for the k^(th) frequency bin at frame index l, the filtered smoothed gain is calculated using the relation G_(k)^(F)(l)=(α₅G_(k−1)(l)+α₆G_(k)(l))*(1/(α₅+α₆)) when (k>1), wherein G_(k)(l) is the smoothed gain for the k^(th) frequency bin at frame index l and (α₅,α₆) are filtering coefficients.
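A minimal sketch, not part of the claims, of claim 8's two-stage refinement in Python; the coefficient values, the frame threshold T, and the reading of the final relation as a filter across adjacent bins are assumptions.

    import numpy as np

    A3, A4 = 0.9, 0.5   # assumed attack / release coefficients
    A5, A6 = 1.0, 2.0   # assumed across-bin filter weights
    T = 10              # assumed gain smoothing threshold, in frames

    def refine_bin_gains(Gw, Gw_prev, l):
        # Claim 8 sketch: smooth the instantaneous gain Gw over time,
        # then filter the result across adjacent frequency bins (k > 1).
        if l < T:   # startup branch of the claim
            G = A4 * Gw
        else:
            G = np.where(Gw_prev > Gw,
                         A3 * Gw_prev + (1 - A3) * Gw,
                         A4 * Gw_prev + (1 - A4) * Gw)
        GF = G.copy()
        GF[1:] = (A5 * G[:-1] + A6 * G[1:]) / (A5 + A6)
        return GF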
9. A system for controlling gain of a near-end signal in a voice enhancement system, the near-end signal being a speech signal of a user at a near end, the system comprising: a module for identifying a plurality of frequency bins which contain speech energy; a local gain estimator operating to estimate local gain for each of the plurality of frequency bins; a speech identifier operable to identify bins of the plurality of frequency bins that contain speech; a local gain smoothing module operable to smooth gain in the plurality of frequency bins; and a global gain regulator operable to estimate and regulate a global gain over all of the plurality of frequency bins.
10. The system according to claim 9, wherein the module for identifying frequency bins containing speech comprises a processor configured to compute the relation bvad_(k)(l)=1 when (G_(k)(l)>λ₁) and vad(l)=1 when (bvad_(k)(l)=1, for any k), wherein G_(k)(l) is the smoothed gain, λ₁ is a speech decision threshold, bvad_(k)(l) is a Voice Activity Detector (VAD) decision for the k^(th) frequency bin at frame index l and vad(l) is the VAD decision at frame index l.
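For illustration only, claim 10's decision logic is sketched below; the threshold value λ₁ is an assumption.

    import numpy as np

    LAMBDA1 = 0.5   # assumed speech decision threshold (λ₁)

    def bin_vad(G):
        # Claim 10 sketch: per-bin speech flags from smoothed gains G,
        # and a frame decision that is 1 when any bin is flagged.
        bvad = (G > LAMBDA1).astype(int)
        return bvad, int(bvad.any())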
11. The system according to claim 9, wherein the regulated global gain at frame index l is computed using the relation G_(r)(l)=(1/sqrt(msqr(l)))*TL, wherein TL is the target level and msqr(l) is the estimated mean square signal level, and the local gain for frame index l is computed using the relation G^(M)(l)=(1−λ₅)*G^(M)(l−1)+λ₅*G_(r)(l) when (G^(M)(l−1)>G_(r)(l)) and G^(M)(l)=(1−λ₆)*G^(M)(l−1)+λ₆*G_(r)(l) when (G^(M)(l−1)≦G_(r)(l)), wherein G_(r)(l) is the regulated global gain and (λ₅,λ₆) are filter coefficients.
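For illustration, a possible rendering of claim 11's relations in Python; TL, λ₅, λ₆ and the small constant guarding the square root are assumed values.

    import numpy as np

    TL = 0.25           # assumed target level
    L5, L6 = 0.3, 0.1   # assumed filter coefficients (λ₅, λ₆)

    def regulate_gain(frame, GM_prev):
        # Claim 11 sketch: regulated global gain from the frame's mean
        # square level, then dual-rate smoothing into the local gain G^M.
        msqr = np.mean(frame ** 2) + 1e-12   # guard against silence
        Gr = TL / np.sqrt(msqr)
        lam = L5 if GM_prev > Gr else L6
        GM = (1 - lam) * GM_prev + lam * Gr
        return Gr, GM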
12. The system according to claim 9, wherein the smoothed local gain for speech bins is computed for the k^(th) frequency bin at frame index l using the relation G_(k)^(AGC)(l)=(1−λ₇)*G_(k)^(AGC)(l−1)+λ₇*G^(M)(l) when (G_(k)^(AGC)(l−1)>G^(M)(l)) and G_(k)^(AGC)(l)=(1−λ₈)G_(k)^(AGC)(l−1)+λ₈G^(M)(l) when (G_(k)^(AGC)(l−1)≦G^(M)(l)), and the smoothed local gain for noise bins is computed using the relation G_(k)^(AGC)(l)=λ₉*G_(k)^(AGC)(l−1) when ((G_(k)^(AGC)(l−1)>1) or (G_(k)^(AGC)(l−1)>G^(M)(l))), otherwise G_(k)^(AGC)(l)=G_(k)^(AGC)(l−1), wherein G_(k)^(AGC)(l) is the smoothed local gain for the k^(th) frequency bin at frame index l, G^(M)(l) is the local gain at frame index l and (λ₇,λ₈,λ₉) are filter coefficients.
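A sketch, with assumed λ₇, λ₈, λ₉, of claim 12's dual treatment of speech and noise bins; using the bvad mask of claim 10 to select between the two update rules is itself an assumption of this sketch.

    import numpy as np

    L7, L8, L9 = 0.2, 0.4, 0.95   # assumed filter coefficients (λ₇, λ₈, λ₉)

    def smooth_local_gain(Gagc_prev, GM, bvad):
        # Claim 12 sketch: speech bins converge on the local gain G^M at
        # dual rates; noise bins decay by λ₉ while above 1 or above G^M.
        speech = np.where(Gagc_prev > GM,
                          (1 - L7) * Gagc_prev + L7 * GM,
                          (1 - L8) * Gagc_prev + L8 * GM)
        noise = np.where((Gagc_prev > 1.0) | (Gagc_prev > GM),
                         L9 * Gagc_prev, Gagc_prev)
        return np.where(bvad == 1, speech, noise)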
13. A system for controlling a Non-Linear Processor (NLP) to activate and deactivate the NLP for complete removal of residual echo in an echo alone region of a microphone output signal without chopping of a near-end speech signal, the system comprising: an estimator configured to produce respective estimates for a plurality of decision parameters; a detector for detecting convergence of an adaptive echo cancellation filter; a controller to output an NLP decision for a frame of speech, wherein the NLP decision indicates whether the NLP is to be active or inactive; a module for updating an NLP energy threshold parameter; a Single Talk (ST) hangover breaker; a Double Talk (DT) hangover breaker; and a module for revising the NLP decision.
14. The system according to claim 13, wherein the decision parameters consist of a first set of parameters and a second set of parameters, each selected from the group consisting of: enhanced error signal (e_(n)(n)), modified microphone signal (d′(n)), echo indicator parameter (ed_(enr)(n)), enhanced error signal energy (e_(enr)(n)), modified microphone signal energy (d_(enr)(n)), noise signal energy (v_(enr)(n)), long term average of reference signal amplitude (ly(n)), absolute error signal (e_(abs)(n)), absolute microphone signal (d_(abs)(n)), NLP energy threshold (nlp_(enr)(n)), startup indicator counter (m_cnt(n)), recent noise frame counter (v_cnt(l)), adaptation counter (adp_cnt(n)), suppressor activated counter (sup_cnt(n)), hist counter (his_cnt(n)), single talk hangover counter (st_hngovr(n)), double talk hangover counter (dt_hngovr(n)), convergence indicator (conv(n)) and distortion indicator (distortion(n)).
15. The system according to claim 13, wherein the estimator is operable to initialize the decision parameters both during startup of the voice enhancement system, in which all parameters are set to zero, and during decision making for every near-end signal sample, in which the estimator sets the NLP decision at time instant n (nlp(n)) to zero, sets distortion(n) to zero, sets st_hngovr(n) at time instant n to st_hngovr(n−1) and sets nlp_(enr)(n) to nlp_(enr)(n−1).
16. The system according to claim 13, wherein the estimator is operable to estimate an echo indicator parameter (ed_(enr)(n)) from a cross correlation between a modified microphone signal (d′(n)) and an enhanced error signal (e_(n)(n)), wherein the estimator is operable to produce d′(n) by scaling a microphone signal and to produce e_(n)(n) by scaling an error signal received from a residual echo remover, and ed_(enr)(n) is computed as ed_(enr)(n)=ed_(enr)(n−1)−(d′(n−K)*e_(n)(n−K))+(d′(n)*e_(n)(n)), wherein K is a window factor, and the estimator is operable to estimate energy of the modified microphone signal (d_(enr)(n)) as d_(enr)(n)=d_(enr)(n−1)−(d′(n−K)*d′(n−K))+(d′(n)*d′(n)) and to estimate energy of the enhanced error signal (e_(enr)(n)) as e_(enr)(n)=e_(enr)(n−1)−(e_(n)(n−K)*e_(n)(n−K))+(e_(n)(n)*e_(n)(n)).

17. The system according to claim 16, wherein the ST hangover breaker is operable to calculate a noise energy using a moving average filter and the relation v_(enr)(n)=v_(enr)(n−1)+β₁[e_(enr)(n)−v_(enr)(n−1)] when (e_(enr)(n)>v_(enr)(n−1)) and v_(enr)(n)=v_(enr)(n−1)+β₂[e_(enr)(n)−v_(enr)(n−1)] when (e_(enr)(n)≦v_(enr)(n−1)).
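By way of illustration, the sliding-window updates of claim 16 and the dual-rate noise average of claim 17 might be realized as below; the window length K and the β rates are assumed values.

    from collections import deque

    K = 160   # assumed correlation window length in samples

    class WindowedEnergies:
        # Claim 16 sketch: sliding K-sample cross correlation and energies.
        def __init__(self):
            self.dbuf = deque([0.0] * K, maxlen=K)
            self.ebuf = deque([0.0] * K, maxlen=K)
            self.ed_enr = self.d_enr = self.e_enr = 0.0

        def update(self, d_mod, e_enh):
            # Drop the sample leaving the window, add the new one.
            d_old, e_old = self.dbuf[0], self.ebuf[0]
            self.ed_enr += d_mod * e_enh - d_old * e_old
            self.d_enr += d_mod * d_mod - d_old * d_old
            self.e_enr += e_enh * e_enh - e_old * e_old
            self.dbuf.append(d_mod)
            self.ebuf.append(e_enh)

    def noise_track(v_prev, e_enr, b1=0.05, b2=0.2):
        # Claim 17 sketch: dual-rate moving average of noise energy,
        # with assumed rates β₁ (rise) and β₂ (fall).
        b = b1 if e_enr > v_prev else b2
        return v_prev + b * (e_enr - v_prev)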
18. The system according to claim 13, wherein the estimator is operable to compute an absolute error signal e_(abs)(n) as e_(abs)(n)=e_(abs)(n−1)+(|ś(n)|−e_(abs)(n−1))*β₄ when (|d(n−D₁)|<β₅ & d_(abs)(n)<|d(n−D₁)|), otherwise e_(abs)(n)=e_(abs)(n−1)+(|ś(n)|−e_(abs)(n−1))*β₆, wherein d_(abs)(n) is the absolute microphone signal, ś(n) is the error signal received from a residual echo remover, d(n) is a microphone signal, D₁ is a delay compensation factor between the microphone signal d(n) and ś(n), and (β₄,β₅,β₆) are predefined constants.
19. The system according to claim 18, wherein d_(abs)(n) is calculated as d_(abs)(n)=d_(abs)(n−1)+(|d(n−D₁)|−d_(abs)(n−1))*β₄ when (|d(n−D₁)|<β₅ & d_(abs)(n)<|d(n−D₁)|), otherwise d_(abs)(n)=d_(abs)(n−1)+(|d(n−D₁)|−d_(abs)(n−1))*β₆.
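For illustration, claims 18-19 share one switching condition, so a single sketch covers both; the values of β₄, β₅ and β₆ are assumed, and d_delayed stands for the delay-compensated sample d(n−D₁).

    B4, B5, B6 = 0.1, 100.0, 0.01   # assumed smoothing/threshold constants

    def update_abs_averages(e_abs, d_abs, s_err, d_delayed):
        # Claims 18-19 sketch: short term absolute averages of the error
        # signal ś(n) and the delayed microphone signal d(n−D₁); the
        # faster rate β₄ applies when the microphone is quiet and rising
        # above its own average, otherwise the slower rate β₆.
        fast = abs(d_delayed) < B5 and d_abs < abs(d_delayed)
        rate = B4 if fast else B6
        e_abs += (abs(s_err) - e_abs) * rate
        d_abs += (abs(d_delayed) - d_abs) * rate
        return e_abs, d_abs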
20. The system according to claim 16, wherein the estimator is operable to calculate a distortion indicator (distortion(n)) by calculating a ratio between enhanced error signal energy e_(enr)(n) and echo indicator parameter ed_(enr)(n) for time instant n and comparing the ratio to a predefined threshold β₁₉, calculating a ratio between modified microphone signal energy d_(enr)(n) and echo indicator parameter ed_(enr)(n) for the time instant n and comparing the ratio to a predefined threshold β₁₃, and if either comparison is successful, setting distortion(n) to indicate distortion at time index n.
21. The system according to claim 14, wherein the second set of parameters comprises: m_cnt(n), calculated as m_cnt(n)=m_cnt(n−1)+1 when (m_cnt(n−1)<β₃), for every processed sample from the microphone; v_cnt(l), to indicate a number of recent noise frames observed, wherein v_cnt(l)=0 when (VAD(l)=Voice) and v_cnt(l)=v_cnt(l−1)+1 when (VAD(l)=Noise), wherein VAD(l) is either a Voice or Noise decision from a Voice Activity Detector, and v_cnt(l) and v_cnt(l−1) are the recent noise frame counter at frame indexes l and l−1 respectively; adp_cnt(n), to indicate a number of samples for which the echo cancellation filters have been adapted, calculated as adp_cnt(n)=adp_cnt(n−1)+1 when (ADAP(n)=1) and adp_cnt(n)=adp_cnt(n−1) when (ADAP(n)=0), wherein ADAP(n) is an adaptation indication flag estimated from the double talk detector at time instant n; sup_cnt(n), to indicate a number of samples for which the NLP is activated before convergence of the echo cancellation filters, calculated by incrementing for every NLP ON decision before convergence is achieved for a speech frame; and his_cnt(n), which tracks stability of convergence.
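An illustrative rendering of claim 21's bookkeeping; the startup budget β₃ and the dict-based state are assumptions of this sketch.

    B3 = 16000   # assumed startup sample budget (β₃)

    def update_counters(state, vad_is_noise, adap, nlp_on, converged):
        # Claim 21 sketch; `state` holds m_cnt, v_cnt, adp_cnt, sup_cnt.
        if state["m_cnt"] < B3:              # startup indicator counter
            state["m_cnt"] += 1
        state["v_cnt"] = state["v_cnt"] + 1 if vad_is_noise else 0
        if adap:                             # filter adapted this sample
            state["adp_cnt"] += 1
        if nlp_on and not converged:         # NLP fired before convergence
            state["sup_cnt"] += 1
        return state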
22. The system according to claim 21, wherein the detector is operable to update a convergence indicator conv(n) by: calculating a ratio between modified microphone signal energy d_(enr)(n) and echo indicator parameter ed_(enr)(n) for the time instant n; comparing the ratio to a predefined threshold β₉ when adp_cnt(n)=0 and v_cnt(l)=0, wherein adp_cnt(n) and v_cnt(l) are the adaptation counter at time instant n and the recent noise frame counter at frame index l respectively; checking for continuous success in the comparison using his_cnt(n); setting conv(n) at time index n to 1 if the number of continuous successful comparisons is more than a predefined threshold β₁₀; and resetting his_cnt(n) to 0 upon the first failure in the comparison before the count reaches the predefined threshold β₁₀.
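A sketch of claim 22's convergence test; the direction of the β₉ comparison and both threshold values are assumptions, since the claim only recites that the ratio is compared to β₉, and the claim's gating on the adaptation and noise-frame counters is omitted here for brevity.

    B9, B10 = 2.0, 100   # assumed ratio threshold and required streak

    def update_convergence(state, d_enr, ed_enr):
        # Claim 22 sketch: conv(n) is set once the d_enr/ed_enr ratio
        # passes its test for B10 consecutive checks; one failure
        # before the streak completes resets his_cnt.
        if ed_enr > 0 and d_enr / ed_enr < B9:
            state["his_cnt"] += 1
            if state["his_cnt"] > B10:
                state["conv"] = 1
        else:
            state["his_cnt"] = 0
        return state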
23. The system according to claim 16, wherein the NLP controller is operable to make NLP decisions both before convergence of the echo cancellation filters and after convergence of the echo cancellation filters, wherein the NLP controller is operable to make decisions after convergence of the echo cancellation filter by: a coarse decision based on the ratio between echo indicator parameter ed_(enr)(n) and modified microphone signal energy d_(enr)(n); a decision based on distortion(n); coarse decision making based on a ratio between ed_(enr)(n) and e_(enr)(n); second level coarse decision making based on the ratio between ed_(enr)(n) and d_(enr)(n); and breaking a double talk hangover based on amplification of the error or attenuation of the error more than an expected threshold, as determined by the ratios.
24. The system according to claim 23, wherein the NLP energy threshold is calculated using the relation nlp_(enr)(n)=nlp_(enr)(n−1)+(e_(enr)(n)−nlp_(enr)(n−1))*β₂₅ when the enhanced error signal energy e_(enr)(n) is greater than the NLP energy threshold nlp_(enr)(n−1), otherwise nlp_(enr)(n)=nlp_(enr)(n−1)+(e_(enr)(n)−nlp_(enr)(n−1))*β₂₆, wherein e_(enr)(n) is the enhanced error signal energy and (β₂₅,β₂₆) are predefined constants.
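Claim 24's threshold tracker, sketched with assumed rates; the fast/slow split mirrors the two branches of the recited relation.

    B25, B26 = 0.2, 0.01   # assumed fast-rise / slow-fall rates (β₂₅, β₂₆)

    def update_nlp_threshold(nlp_enr, e_enr):
        # Claim 24 sketch: the NLP energy threshold tracks the enhanced
        # error energy quickly upward and slowly downward.
        rate = B25 if e_enr > nlp_enr else B26
        return nlp_enr + (e_enr - nlp_enr) * rate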
25. The system according to claim 14, wherein the ST hangover breaker is operable to break an ST hangover, and reset st_hngovr(n), based on one or more of the following conditions being found to exist: a ratio between enhanced error signal energy e_(enr)(n) and NLP energy threshold nlp_(enr)(n) is greater than a predefined threshold β₃₀ when the enhanced error energy is greater than β₃₂ and distortion(n) indicates no distortion; enhanced error signal energy e_(enr)(n) is greater than NLP energy threshold nlp_(enr)(n) by a predefined threshold β₃₁ when the error energy is greater than β₃₂ and distortion(n) indicates no distortion; and a long term average of the reference signal ly(n) is greater than a predefined threshold β₃₃.

26. The system according to claim 14, wherein the DT hangover breaker is operable to break a DT hangover, and reset dt_hngovr(n), based on one or more of the following conditions being found to exist: a ratio between echo indicator ed_(enr)(n) and modified microphone signal energy d_(enr)(n) is less than a predefined threshold β₁₃; a ratio between d_(enr)(n) and enhanced error signal energy e_(enr)(n) is greater than a predefined threshold β₁₅ when the error signal is below the predefined threshold β₁₆; a ratio between d_(enr)(n) and e_(enr)(n) is greater than a predefined threshold β₁₇ when the error signal is below the predefined threshold β₁₈; a ratio between ed_(enr)(n) and e_(enr)(n) is less than a predefined threshold β₁₉; a ratio between ed_(enr)(n) and d_(enr)(n) is less than a predefined threshold β₂₁ when conv(n) is zero; a ratio between ed_(enr)(n) and d_(enr)(n) is greater than a predefined threshold β₂₂ when conv(n) is zero; a ratio between d_(enr)(n) and e_(enr)(n) is less than a predefined threshold β₂₇; and a ratio between d_(enr)(n) and e_(enr)(n) is less than a predefined threshold β₂₈.
27. The system according to claim 26, wherein if any condition is found to exist, then the remaining conditions are not checked, and the double talk hangover is broken.
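For illustration, claims 26-27 amount to an early-exit scan over ratio tests; the sketch below covers a subset of the recited conditions, writes each ratio in cross-multiplied form to avoid division by zero, omits the secondary energy guards, and treats every β value as an assumed constant.

    def break_dt_hangover(d_enr, e_enr, ed_enr, conv, t):
        # Claims 26-27 sketch: test conditions in order and stop at the
        # first hit; `t` maps names like "b13" to assumed β values.
        if ed_enr < t["b13"] * d_enr:                 # weak echo indicator
            return True
        if d_enr > t["b15"] * e_enr:                  # error strongly attenuated
            return True
        if ed_enr < t["b19"] * e_enr:
            return True
        if conv == 0 and ed_enr < t["b21"] * d_enr:
            return True
        if conv == 0 and ed_enr > t["b22"] * d_enr:
            return True
        return d_enr < t["b27"] * e_enr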
28. The system according to claim 14, wherein the DT hangover breaker is operable to break a DT hangover, and reset dt_hngovr(n), based on detecting that the enhanced error signal energy e_(enr)(n) is greater than a predefined threshold β₂₉.
29. The system according to claim 14, wherein the ST hangover breaker is operable to break an ST hangover, and reset st_hngovr(n), based on one or more of the following conditions being found to exist, and wherein if one condition is found to exist, the remaining conditions are not checked: a ratio between echo indicator ed_(enr)(n) and modified microphone signal energy d_(enr)(n) is greater than a predefined threshold β₁₃; a ratio between enhanced error signal energy e_(enr)(n) and ed_(enr)(n) is greater than a predefined threshold β₁₉ and the single talk hangover counter is less than β₂₀; a ratio between ed_(enr)(n) and d_(enr)(n) is greater than a predefined threshold β₁₃; and a ratio between d_(enr)(n) and ed_(enr)(n) is greater than a predefined threshold β₁₉.
30. The system according to claim 14, wherein the ST hangover breaker setting is based on a plurality of parameters and a predefined group of validation conditions, and wherein the presence of at least one validation condition is sufficient to set the single talk hangover, and the system avoids checking the remaining validation conditions.
31. The system according to claim 14, wherein the ST hangover validation conditions comprise: checking whether the ratio between modified microphone signal energy d_(enr)(n) and echo indicator ed_(enr)(n) is greater than a predefined threshold β₂₁ and the ratio between echo indicator ed_(enr)(n) and modified microphone signal energy d_(enr)(n) is greater than a predefined threshold β₂₃; checking whether the absolute short term average error signal e_(abs)(n) is greater than the absolute short term average microphone signal d_(abs)(n) by a predefined threshold β₃₄ when d_(abs)(n) is greater than zero; and checking whether the ratio between the absolute short term average error signal e_(abs)(n) and the absolute short term average microphone signal d_(abs)(n) is greater than a predefined threshold β₃₅ when d_(abs)(n) is greater than zero.
32. The system according to claim 14, wherein the ST hangover validation conditions comprise: checking whether the ratio between modified microphone signal energy d_(enr)(n) and echo indicator ed_(enr)(n) is greater than a predefined threshold and the ratio between echo indicator ed_(enr)(n) and modified microphone signal energy d_(enr)(n) is greater than a predefined threshold; checking whether the absolute short term average error signal e_(abs)(n) is greater than the absolute short term average microphone signal d_(abs)(n) by a predefined threshold β₃₄ when d_(abs)(n) is greater than zero; and checking whether the ratio between the absolute short term average error signal e_(abs)(n) and the absolute short term average microphone signal d_(abs)(n) is greater than a predefined threshold β₃₅ when d_(abs)(n) is greater than zero.
33. The system according to claim 13, wherein the module for revising the NLP decision is operable to: set the NLP decision to zero when the long term average of reference signal amplitude ly(n) is less than a predefined threshold β₃₃; set the NLP decision to one when the absolute short term average error signal e_(abs)(n) is greater than the absolute short term average microphone signal d_(abs)(n) by a predefined threshold β₃₄ when d_(abs)(n) is greater than zero; and set the NLP decision to one when the ratio between the absolute short term average error signal e_(abs)(n) and the absolute short term average microphone signal d_(abs)(n) is greater than a predefined threshold β₃₅ when d_(abs)(n) is greater than zero.